Gaussian Mixture Modeling Using Short Time Fourier Transform Features for Audio Fingerprinting
Abstract
In audio fingerprinting, a song must be recognized by matching an extracted fingerprint to a database of previously computed fingerprints. One of the key issues in fingerprinting is generating fingerprints that discriminate among different songs and are at the same time invariant to distorted versions of the same song. In this paper, we evaluate various features, such as spectral centroid, spectral bandwidth, spectral flatness measure, spectral crest factor, Rényi’s entropy, and Mel-frequency cepstral coefficients, under a large number of distortions by modeling them using Gaussian mixture models (GMMs). To make the system more robust, we use distorted versions of the audio for training. However, we show that audio fingerprints modeled using GMMs are robust not only to the distortions used in training but also to distortions not used in training. By modeling audio fingerprints with GMMs using spectral centroid and spectral flatness measure alone as features, we obtain a recognition performance of 99.5%.
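As a rough illustration of the two features the abstract singles out, the sketch below computes a per-frame spectral centroid and spectral flatness measure from short-time Fourier frames. It uses the common textbook definitions (power-weighted mean frequency, and the ratio of geometric to arithmetic mean of the power spectrum); the paper's exact definitions, frame sizes, and any subband splitting are not given here, so the function names, `frame_len`, `hop`, and the Hann window are all assumptions. The resulting two-dimensional feature vectors are the kind of input one would then fit with a GMM, which is not shown.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum of one real-valued frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_centroid(power, sample_rate, frame_len):
    """Power-weighted mean frequency in Hz (assumed textbook definition)."""
    total = sum(power)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len
    return sum(k * bin_hz * p for k, p in enumerate(power)) / total


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arith_mean


def frame_features(signal, sample_rate, frame_len=64, hop=32):
    """Return one (centroid, flatness) pair per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        power = power_spectrum(frame)
        features.append((spectral_centroid(power, sample_rate, frame_len),
                         spectral_flatness(power)))
    return features


# Hypothetical test signals: a 1 kHz tone and seeded white noise at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(256)]

tone_features = frame_features(tone, SAMPLE_RATE)
noise_features = frame_features(noise, SAMPLE_RATE)
```

For the pure tone, the centroid sits near the tone frequency and the flatness is close to zero, while the noise frames give a much higher flatness. This contrast is what makes the pair a plausible compact descriptor, though it says nothing about the recognition rate reported above.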
Similar resources
Recognition of Activities of Daily Living Based on Environmental Analyses Using Audio Fingerprinting Techniques: A Systematic Review
An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of ...
Audio Segmentation Using Different Time-Frequency Representations
A blind audio source separation technique for mono mixtures, composed of two noise-free instantaneously mixed audio sources, is presented. The separation is done by image segmentation of different time-frequency representations. Therefore a mixture is transformed by the Short-Time Fourier Transform into time-frequency domain. The resulting spectrogram image is used in log-amplitude or normal am...
Pathologies cardiac discrimination using the Fast Fourier Transform (FFT), the short time Fourier transform (STFT) and the Wigner distribution (WD)
This paper is concerned with a synthesis study of the fast Fourier transform (FFT), the short time Fourier transform (STFT) and the Wigner distribution (WD) in analysing the phonocardiogram signal (PCG) or heart cardiac sounds. The FFT (Fast Fourier Transform) can provide a basic understanding of the frequency contents of the heart sounds. The STFT is obtained by calculating the Fourier tran...
New Filter Structure based on Admissible Wavelet Packet Transform for Text-Independent Speaker Identification
Identical acoustic features like Mel frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are being widely used for different tasks like speech recognition and speaker recognition, whereas the requirement of speaker recognition is different from that of speech recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolutio...
centre for digital music An Adaptive Stereo Basis Method for Convolutive Blind Audio Source Separation
We consider the problem of convolutive blind source separation of stereo mixtures. This is often tackled using frequency-domain independent component analysis (FD-ICA), or time-frequency masking methods such as DUET. In these methods, the short-term Fourier transform (STFT) is used to transform the signal into the time-frequency domain. Instead of using a fixed time-frequency transform on each ...
Publication date: 2005